Differentiated Services (Diffserv) and Real-Time Communication
Abstract 


This memo describes the interaction between Differentiated Services 
(Diffserv) network quality-of-service (QoS) functionality and real- 
time network communication, including communication based on the 
Real-time Transport Protocol (RTP). Diffserv is based on network 
nodes applying different forwarding treatments to packets whose IP 
headers are marked with different Diffserv Codepoints (DSCPs). 

WebRTC applications, as well as some conferencing applications, have 
begun using the Session Description Protocol (SDP) bundle negotiation 
mechanism to send multiple traffic streams with different QoS 
requirements using the same network 5-tuple. The results of using 
multiple DSCPs to obtain different QoS treatments within a single 
network 5-tuple have transport protocol interactions, particularly 
with congestion control functionality (e.g., reordering). In 
addition, DSCP markings may be changed or removed between the traffic 
source and destination. This memo covers the implications of these 
Diffserv aspects for real-time network communication, including 
WebRTC. 


1. Introduction


This memo describes the interactions between Differentiated Services 
(Diffserv) network quality-of-service (QoS) functionality [RFC2475] 
and real-time network communication, including communication based on 
the Real-time Transport Protocol (RTP) [RFC3550]. Diffserv is based 
on network nodes applying different forwarding treatments to packets 
whose IP headers are marked with different Diffserv Codepoints
(DSCPs) [RFC2474]. In the past, distinct RTP streams have been sent
over different transport-level flows, sometimes multiplexed with the 
RTP Control Protocol (RTCP). WebRTC applications, as well as some 
conferencing applications, are now using the Session Description 
Protocol (SDP) [RFC4566] bundle negotiation mechanism [SDP-BUNDLE] to 
send multiple traffic streams with different QoS requirements using 
the same network 5-tuple. The results of using multiple DSCPs to 
obtain different QoS treatments within a single network 5-tuple have 
transport protocol interactions, particularly with congestion control 
functionality (e.g., reordering). In addition, DSCP markings may be 
changed or removed between the traffic source and destination. This 
memo covers the implications of these Diffserv aspects for real-time 
network communication, including WebRTC traffic [WEBRTC-OVERVIEW]. 


The memo is organized as follows. Background is provided in
Section 2 on real-time communications and Section 3 on Differentiated
Services. Section 4 describes some examples of Diffserv usage with 
real-time communications. Section 5 explains how use of Diffserv 
features interacts with both transport and real-time communications 
protocols and Section 6 provides guidance on Diffserv feature usage 
to control undesired interactions. Security considerations are 
discussed in Section 7. 


2. Real-Time Communications


Real-time communications enables communication in real time over an 
IP network using voice, video, text, content sharing, etc. It is 
possible to use more than one of these modes concurrently to provide 
a rich communication experience. 


A simple example of real-time communications is a voice call placed 
over the Internet where an audio stream is transmitted in each 
direction between two users. A more complex example is an immersive 
videoconferencing system that has multiple video screens, multiple 
cameras, multiple microphones, and some means of sharing content. 
For such complex systems, there may be multiple media and non-media 
streams transmitted via a single IP address and port or via multiple 
IP addresses and ports. 
2.1. RTP Background


The most common protocol used for real-time media is RTP [RFC3550]. 
RTP defines a common encapsulation format and handling rules for 
real-time data transmitted over the Internet. Unfortunately, RTP 
terminology usage has been inconsistent. For example, RFC 7656 
[RFC7656] on RTP terminology observes that: 


RTP [RFC3550] uses media stream, audio stream, video stream, and a
stream of (RTP) packets interchangeably, which are all RTP 
streams. 


Terminology in this memo is based on that RTP terminology document
with the following terms being of particular importance (see that 
terminology document for full definitions): 


Source Stream: A reference clock synchronized, time progressing, 
digital media stream. 


RTP Stream: A stream of RTP packets containing media data, which may 
be source data or redundant data. The RTP stream is identified by 
an RTP synchronization source (SSRC) belonging to a particular RTP
session. An RTP stream may be a secured RTP stream when RTP-based 
security is used. 


In addition, this memo follows [RFC3550] in using the term "SSRC" to 
designate both the identifier of an RTP stream and the entity that 
sends that RTP stream. 
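Because each RTP stream is identified by its SSRC, a receiver that carries several RTP streams in one RTP session separates them by reading the SSRC from the 12-byte fixed RTP header of [RFC3550]. The Python sketch below is illustrative only: the function names are hypothetical, and CSRC lists, header extensions, and SRTP protection are ignored.

```python
import struct
from typing import Dict, List, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed part of an RTP header (first 12 bytes)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return RtpHeader(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc)


def group_by_ssrc(packets: List[bytes]) -> Dict[int, List[bytes]]:
    """Split the packets of one RTP session into RTP streams keyed by SSRC."""
    streams: Dict[int, List[bytes]] = {}
    for pkt in packets:
        streams.setdefault(parse_rtp_header(pkt).ssrc, []).append(pkt)
    return streams
```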


Media encoding and packetization of a source stream results in a
source RTP stream plus zero or more redundancy RTP streams that
provide resilience against loss of packets from the source RTP stream
[RFC7656]. Redundancy information may also be carried in the same
RTP stream as the encoded source stream, e.g., see Section 7.2 of
[RFC5109]. With most applications, a single media type (e.g., audio)
is transmitted within a single RTP session. However, it is possible
to transmit multiple, distinct source streams over the same RTP
session as one or more individual RTP streams. This is referred to
as RTP multiplexing. In addition, an RTP stream may contain multiple
source streams, e.g., components or programs in an MPEG Transport
Stream [H.222].


The number of source streams and RTP streams in an overall real-time 
interaction can be surprisingly large. In addition to a voice source 
stream and a video source stream, there could be separate source 
streams for each of the cameras or microphones on a videoconferencing 
system. As noted above, there might also be separate redundancy RTP 
streams that provide protection to a source RTP stream, using 
techniques such as forward error correction. Another example is
simulcast transmission, where a video source stream can be 
transmitted as high resolution and low resolution RTP streams at the 
same time. In this case, a media processing function might choose to 
send one or both RTP streams onward to a receiver based on bandwidth 
availability or who the active speaker is in a multipoint conference. 
Lastly, a transmitter might send the same media content concurrently 
as two RTP streams using different encodings (e.g., video encoded as 
VP8 [RFC6386] in parallel with H.264 [H.264]) to allow a media 
processing function to select a media encoding that best matches the 
capabilities of the receiver. 


For the WebRTC protocol suite [WEBRTC-TRANSPORTS], an individual 
source stream is a MediaStreamTrack, and a MediaStream contains one 
or more MediaStreamTracks [W3C.WD-mediacapture-streams-20130903]. A
MediaStreamTrack is transmitted as a source RTP stream plus zero or
more redundant RTP streams, so a MediaStream that consists of one 
MediaStreamTrack is transmitted as a single source RTP stream plus 
zero or more redundant RTP streams. For more information on use of 
RTP in WebRTC, see [RTP-USAGE]. 


RTP is usually carried over a datagram protocol, such as UDP 
[RFC768], UDP-Lite [RFC3828], or the Datagram Congestion Control 
Protocol (DCCP) [RFC4340]; UDP is most commonly used, but a non- 
datagram protocol (e.g., TCP [RFC793]) may also be used. Transport 
protocols other than UDP or UDP-Lite may also be used to transmit 
real-time data or near-real-time data. For example, the Stream 
Control Transmission Protocol (SCTP) [RFC4960] can be utilized to 
carry application-sharing or whiteboarding information as part of an 
overall interaction that includes real-time media. These additional 
transport protocols can be multiplexed with an RTP session via UDP 
encapsulation, thereby using a single pair of UDP ports. 


The WebRTC protocol suite encompasses a number of forms of
multiplexing:

1. Individual source streams are carried in one or more individual
RTP streams. These RTP streams can be multiplexed onto a single
transport-layer flow or sent as separate transport-layer flows.
This memo only considers the case where the RTP streams are to be
multiplexed onto a single transport-layer flow, forming a single
RTP session as described in [RFC3550];


2. RTCP (see [RFC3550]) may be multiplexed onto the same transport- 
layer flow as the RTP streams with which it is associated, as 
described in [RFC5761], or it may be sent on a separate 
transport-layer flow; 
3. An RTP session could be multiplexed with a single SCTP
association over Datagram Transport Layer Security (DTLS) and
with both Session Traversal Utilities for NAT (STUN) [RFC5389]
and Traversal Using Relays around NAT (TURN) [RFC5766] traffic
into a single transport-layer flow as described in [RFC5764] with
the updates in [SRTP-DTLS]. The STUN and TURN protocols provide
NAT/FW (Network Address Translator / Firewall) traversal and port
mapping.
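When all of these protocols share one UDP 5-tuple, the receiving endpoint has to tell them apart from the packet contents alone. A minimal sketch of first-byte demultiplexing follows. The 0-3 (STUN), 20-63 (DTLS), and 128-191 (RTP/RTCP) ranges follow the [RFC5764] scheme, and RTP and RTCP are then separated by the second byte as in [RFC5761]; the 64-79 range for TURN ChannelData is assumed from the later update to that scheme. The function name is hypothetical.

```python
def classify_datagram(payload: bytes) -> str:
    """Guess which protocol a datagram on a shared UDP 5-tuple carries."""
    if not payload:
        return "unknown"
    first = payload[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"  # DTLS records, which carry the SCTP data channel
    if 64 <= first <= 79:
        return "turn-channel"  # assumed range for TURN ChannelData
    if 128 <= first <= 191:
        # RTP and RTCP share this range. A second byte of 192-223 is an
        # RTCP packet type rather than the RTP marker bit plus payload type.
        if len(payload) >= 2 and 192 <= payload[1] <= 223:
            return "rtcp"
        return "rtp"
    return "unknown"
```

This demultiplexing happens only at the endpoints; a network classifier that stops at the outer UDP header sees all of this traffic as a single flow.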


The resulting transport-layer flow is identified by a network 
5-tuple, i.e., a combination of two IP addresses (source and 
destination), two ports (source and destination), and the transport 
protocol used (e.g., UDP). SDP bundle negotiation restrictions 
[SDP-BUNDLE] limit WebRTC to using at most a single DTLS session per 
network 5-tuple. In contrast to WebRTC use of a single SCTP 
association with DTLS, multiple SCTP associations can be directly 
multiplexed over a single UDP 5-tuple as specified in [RFC6951]. 
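A minimal sketch of the 5-tuple as a flow key, assuming packets have already been parsed into addresses, ports, and protocol (the names below are hypothetical). With bundling, RTP, RTCP, and DTLS/SCTP packets all share one key, so a node that classifies on the 5-tuple sees exactly one flow.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """Identifies a transport-layer flow."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str  # e.g., "udp"


def group_into_flows(
    packets: List[Tuple[FiveTuple, bytes]],
) -> Dict[FiveTuple, List[bytes]]:
    """Group (5-tuple, payload) pairs by transport-layer flow."""
    flows: Dict[FiveTuple, List[bytes]] = defaultdict(list)
    for five_tuple, payload in packets:
        flows[five_tuple].append(payload)
    return dict(flows)
```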


The STUN and TURN protocols were originally designed to use UDP as a 
transport; however, TURN has been extended to use TCP as a transport 
for situations in which UDP does not work [RFC6062]. When TURN 
selects use of TCP, the entire real-time communications session is 
carried over a single TCP connection (i.e., 5-tuple). 


For IPv6, addition of the flow label [RFC6437] to network 5-tuples 
results in network 6-tuples (or 7-tuples for bidirectional flows), 
but in practice, use of a flow label is unlikely to result in a
finer-grain traffic subset than the corresponding network 5-tuple 
(e.g., the flow label is likely to represent the combination of two 
ports with use of the UDP protocol). For that reason, discussion in 
this document focuses on UDP 5-tuples. 


2.2. RTP Multiplexing


Section 2.1 explains how source streams can be multiplexed in a
single RTP session, which can in turn be multiplexed over UDP with 
packets generated by other transport protocols. This section 
provides background on why this level of multiplexing is desirable. 
The rationale in this section applies both to multiplexing of source 
streams in a single RTP session and multiplexing of an RTP session 
with traffic from other transport protocols via UDP encapsulation. 


Multiplexing reduces the number of ports utilized for real-time and 
related communication in an overall interaction. While a single 
endpoint might have plenty of ports available for communication, this 
traffic often traverses points in the network that are constrained on 
the number of available ports or whose performance degrades as the 
number of ports in use increases. A good example is a NAT/FW device 
sitting at the network edge. As the number of simultaneous protocol
sessions increases, so does the burden placed on these devices to 
provide port mapping. 


Another reason for multiplexing is to help reduce the time required 
to establish bidirectional communication. Since any two 
communicating users might be situated behind different NAT/FW 
devices, it is necessary to employ techniques like STUN and TURN 
along with Interactive Connectivity Establishment (ICE) [RFC5245] to 
get traffic to flow between the two devices [WEBRTC-TRANSPORTS]. 
Performing the tasks required by these protocols takes time, 
especially when multiple protocol sessions are involved. While tasks 
for different sessions can be performed in parallel, it is 
nonetheless necessary for applications to wait for all sessions to be 
opened before communication between two users can begin. Reducing 
the number of STUN/ICE/TURN steps reduces the likelihood of loss of a 
packet for one of these protocols; any such loss adds delay to 
setting up a communication session. Further, reducing the number of 
STUN/ICE/TURN tasks places a lower burden on the STUN and TURN 
servers. 


Multiplexing may reduce the complexity and resulting load on an 
endpoint. A single instance of STUN/ICE/TURN is simpler to execute 
and manage than multiple instances of STUN/ICE/TURN operations happening
in parallel, as the latter require synchronization and create more 
complex failure situations that have to be cleaned up by additional 
code. 


3. Differentiated Services (Diffserv) 


The Diffserv architecture [RFC2475] [RFC4594] is intended to enable 
scalable service discrimination in the Internet without requiring 
each node in the network to store per-flow state and participate in 
per-flow signaling. The services may be end to end or within a 
network; they include both those that can satisfy quantitative 
performance requirements (e.g., peak bandwidth) and those based on 
relative performance (e.g., "class" differentiation). Services can 
be constructed by a combination of well-defined building blocks 
deployed in network nodes that: 


o classify traffic and set bits in an IP header field at network 
boundaries or hosts, 


o use those bits to determine how packets are forwarded by the nodes 
inside the network, and 


o condition the marked packets at network boundaries in accordance 
with the requirements or rules of each service. 
Traffic conditioning may include changing the DSCP in a packet
(remarking it), delaying the packet (as a consequence of traffic 
shaping), or dropping the packet (as a consequence of traffic 
policing). 


A network node that supports Diffserv includes a classifier that 
selects packets based on the value of the DS field in IP headers (the 
Diffserv codepoint or DSCP), along with buffer management and packet 
scheduling mechanisms capable of delivering the specific packet 
forwarding treatment indicated by the DS field value. Setting of the 
DS field and fine-grain conditioning of marked packets need only be 
performed at network boundaries; internal network nodes operate on 
traffic aggregates that share a DS field value, or in some cases, a 
small set of related values. 
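The DS field occupies the upper six bits of the IPv4 TOS octet or the IPv6 Traffic Class octet; the lower two bits carry ECN [RFC3168]. The helpers below (hypothetical names) show that bit layout, and the last one uses the standard `IP_TOS` socket option to request a marking. Whether the operating system honors the request, and whether the network preserves the result, is outside the sender's control.

```python
import socket


def dscp_from_ds_byte(ds_byte: int) -> int:
    """The DSCP is the upper six bits of the TOS / Traffic Class octet."""
    return (ds_byte >> 2) & 0x3F


def ecn_from_ds_byte(ds_byte: int) -> int:
    """The lower two bits of that octet are the ECN field, not Diffserv."""
    return ds_byte & 0x03


def make_ds_byte(dscp: int, ecn: int = 0) -> int:
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in two bits")
    return (dscp << 2) | ecn


def dscp_from_ipv4_header(header: bytes) -> int:
    return dscp_from_ds_byte(header[1])  # second octet of the IPv4 header


def dscp_from_ipv6_header(header: bytes) -> int:
    # Traffic Class spans octets 0 and 1, right after the 4-bit version.
    traffic_class = ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    return dscp_from_ds_byte(traffic_class)


def mark_ipv4_socket(sock: socket.socket, dscp: int) -> None:
    """Request a DSCP for outgoing IPv4 packets on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, make_ds_byte(dscp))
```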


The Diffserv architecture [RFC2475] maintains distinctions among: 
o the QoS service provided to a traffic aggregate, 


o the conditioning functions and per-hop behaviors (PHBs) used to 
realize services, 


o the DSCP in the IP header used to mark packets to select a per-hop 
behavior, and 


o the particular implementation mechanisms that realize a per-hop 
behavior. 


This memo focuses on PHBs and the usage of DSCPs to obtain those 
behaviors. In a network node's forwarding path, the DSCP is used to
map a packet to a particular forwarding treatment, or to a per-hop 
behavior (PHB) that specifies the forwarding treatment. 


The specification of a PHB describes the externally observable 
forwarding behavior of a network node for network traffic marked with 
a DSCP that selects that PHB. In this context, "forwarding behavior" 
is a general concept - for example, if only one DSCP is used for all 
traffic on a link, the observable forwarding behavior (e.g., loss, 
delay, jitter) will often depend only on the loading of the link. To 
obtain useful behavioral differentiation, multiple traffic subsets 
are marked with different DSCPs for different PHBs for which node 
resources such as buffer space and bandwidth are allocated. PHBs 
provide the framework for a Diffserv network node to allocate 
resources to traffic subsets, with network-scope Differentiated 
Services constructed on top of this basic hop-by-hop resource 
allocation mechanism. 
The codepoints (DSCPs) may be chosen from a small set of fixed values
(the class selector codepoints), from a set of recommended values 
defined in PHB specifications, or from values that have purely local 
meanings to a specific network that supports Diffserv; in general, 
packets may be forwarded across multiple such networks between source 
and destination. 


The mandatory DSCPs are the class selector codepoints as specified in 
[RFC2474]. The class selector codepoints (CS0-CS7) extend the
deprecated concept of IP Precedence in the IPv4 header; three bits
are added, so that the class selector DSCPs are of the form 'xxx000'.
The all-zero DSCP ('000000' or CS0) is always assigned to a Default
PHB that provides best-effort forwarding behavior, and the remaining 
class selector codepoints are intended to provide relatively better 
per-hop-forwarding behavior in increasing numerical order, but: 


o A network endpoint cannot rely upon different class selector 
codepoints providing Differentiated Services via assignment to 
different PHBs, as adjacent class selector codepoints may use the 
same pool of resources on each network node in some networks. 
This generalizes to ranges of class selector codepoints, but with 
limits -- for example, CS6 and CS7 are often used for network 
control (e.g., routing) traffic [RFC4594] and hence are likely to 
provide better forwarding behavior under network load to 
prioritize network recovery from disruptions. There is no 
effective way for a network endpoint to determine which PHBs are 
selected by the class selector codepoints on a specific network, 
let alone end to end. 


o CS1 ('001000') was subsequently designated as the recommended
codepoint for the Lower Effort (LE) PHB [RFC3662]. An LE service 
forwards traffic with "lower" priority than best effort and can be 
"starved" by best-effort and other "higher" priority traffic. Not 
all networks offer an LE service, hence traffic marked with the 
CS1 DSCP may not receive lower effort forwarding; such traffic may 
be forwarded with a different PHB (e.g., the Default PHB), 
remarked to another DSCP (e.g., CS0) and forwarded accordingly, or 
dropped. A network endpoint cannot rely upon the presence of an 
LE service that is selected by the CS1 DSCP on a specific network, 
let alone end to end. Packets marked with the CS1 DSCP may be 
forwarded with best-effort service or another "higher" priority 
service; see [RFC2474]. See [RFC3662] for further discussion of 
the LE PHB and service. 
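The 'xxx000' form means that CSn is simply n shifted left by three bits. The sketch below (hypothetical names) computes and recognizes class selector codepoints; note that nothing in the code, or available to an endpoint, reveals which PHB a given network actually assigns to each of them.

```python
def class_selector(n: int) -> int:
    """DSCP for CSn: three precedence bits followed by '000'."""
    if not 0 <= n <= 7:
        raise ValueError("class selector index must be 0-7")
    return n << 3


def is_class_selector(dscp: int) -> bool:
    return 0 <= dscp <= 63 and (dscp & 0b111) == 0


def as_bits(dscp: int) -> str:
    return format(dscp, "06b")


CS0 = class_selector(0)  # Default PHB (best effort)
CS1 = class_selector(1)  # recommended for Lower Effort, where offered
```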
3.1. Diffserv Per-Hop Behaviors (PHBs)


Although Differentiated Services is a general architecture that may 
be used to implement a variety of services, three fundamental 
forwarding behaviors (PHBs) have been defined and characterized for 
general use. These are: 


1. Default Forwarding (DF) for elastic traffic [RFC2474]. The 
Default PHB is always selected by the all-zero DSCP and provides 
best-effort forwarding. 


2. Assured Forwarding (AF) [RFC2597] to provide Differentiated 
Service to elastic traffic. Each instance of the AF behavior 
consists of three PHBs that differ only in drop precedence, e.g., 
AF11, AF12, and AF13; such a set of three AF PHBs is referred to 
as an AF class, e.g., AF1x. There are four defined AF classes, 
AF1x through AF4x, with higher numbered classes intended to 
receive better forwarding treatment than lower numbered classes. 
Use of multiple PHBs from a single AF class (e.g., AF1x) does not 
enable network traffic reordering within a single network 
5-tuple, although such reordering may occur for other transient 
reasons (e.g., routing changes or ECMP rebalancing). 


3. Expedited Forwarding (EF) [RFC3246] intended for inelastic 
traffic. Beyond the basic EF PHB, the VOICE-ADMIT PHB [RFC5865] 
is an admission-controlled variant of the EF PHB. Both of these 
PHBs are based on preconfigured limited forwarding capacity; 
traffic in excess of that capacity is expected to be dropped. 
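The recommended AF codepoints follow a regular pattern, AFxy = 8x + 2y (so AF11 is 10 and AF43 is 38), while EF is 46 and VOICE-ADMIT is 44. A minimal sketch, with hypothetical function names, of a conservative check based on the rule above: only identical DSCPs, or PHBs from the same AF class, are treated as not enabling reordering within a 5-tuple.

```python
from typing import Optional

EF = 46           # Expedited Forwarding [RFC3246]
VOICE_ADMIT = 44  # admission-controlled EF variant [RFC5865]


def af(af_class: int, drop_precedence: int) -> int:
    """Recommended DSCP for AF<class><drop>, e.g., af(4, 1) for AF41."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


AF_CODEPOINTS = {af(c, d): c for c in range(1, 5) for d in range(1, 4)}


def af_class_of(dscp: int) -> Optional[int]:
    return AF_CODEPOINTS.get(dscp)


def may_reorder(dscp_a: int, dscp_b: int) -> bool:
    """Does mixing these DSCPs in one 5-tuple invite reordering?

    Transient reordering (routing changes, ECMP rebalancing) is still
    possible even when this returns False.
    """
    if dscp_a == dscp_b:
        return False
    class_a = af_class_of(dscp_a)
    return class_a is None or class_a != af_class_of(dscp_b)
```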


3.2. Traffic Classifiers and DSCP Remarking 


DSCP markings are not end to end in general. Each network can make 
its own decisions about what PHBs to use and which DSCP maps to each 
PHB. While every PHB specification includes a recommended DSCP, and 
RFC 4594 [RFC4594] recommends their end-to-end usage, there is no 
requirement that every network support any PHBs (aside from the 
Default PHB for best-effort forwarding) or use any specific DSCPs, 
with the exception of the support requirements for the class selector 
codepoints (see RFC 2474 [RFC2474]). When Diffserv is used, the edge 
or boundary nodes of a network are responsible for ensuring that all 
traffic entering that network conforms to that network's policies for
DSCP and PHB usage, and such nodes may change DSCP markings on 
traffic to achieve that result. As a result, DSCP remarking is 
possible at any network boundary, including the first network node 
that traffic sent by a host encounters. Remarking is also possible 
within a network, e.g., for traffic shaping. 
DSCP remarking is part of traffic conditioning; the traffic
conditioning functionality applied to packets at a network node is 
determined by a traffic classifier [RFC2475]. Edge nodes of a 
Diffserv network classify traffic based on selected packet header 
fields; typical implementations do not look beyond the traffic's
network 5-tuple in the IP and transport protocol headers (e.g., for 
SCTP or RTP encapsulated in UDP, header-based classification is 
unlikely to look beyond the outer UDP header). As a result, when 
multiple DSCPs are used for traffic that shares a network 5-tuple, 
remarking at a network boundary may result in all of the traffic 
being forwarded with a single DSCP, thereby removing any 
differentiation within the network 5-tuple downstream of the 
remarking location. Network nodes within a Diffserv network 
generally classify traffic based solely on DSCPs, but may perform 
finer-grain traffic conditioning similar to that performed by edge 
nodes. 
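The collapse of differentiation can be shown with a hypothetical edge node whose classifier keys only on the 5-tuple. This is a sketch under that assumption, not a model of any particular product: because it never parses inner RTP or SCTP headers, every packet of a bundled 5-tuple leaves with the same DSCP, whatever the sender chose.

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


class Packet(NamedTuple):
    five_tuple: FiveTuple
    dscp: int
    payload: bytes


class EdgeRemarker:
    """Hypothetical boundary node that remarks by 5-tuple only."""

    def __init__(self, policy: Dict[FiveTuple, int], default_dscp: int = 0):
        self.policy = policy
        self.default_dscp = default_dscp  # CS0 for unrecognized traffic

    def condition(self, pkt: Packet) -> Packet:
        new_dscp = self.policy.get(pkt.five_tuple, self.default_dscp)
        return pkt._replace(dscp=new_dscp)
```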


So, for two arbitrary network endpoints, there can be no assurance 
that the DSCP set at the source endpoint will be preserved and 
presented at the destination endpoint. Rather, it is quite likely 
that the DSCP will be set to zero (e.g., at the boundary of a network 
operator that distrusts or does not use the DSCP field) or to a value 
deemed suitable by an ingress classifier for whatever network 5-tuple 
it carries. 


In addition, remarking may remove application-level distinctions in 
forwarding behavior - e.g., if multiple PHBs within an AF class are 
used to distinguish different types of frames within a video RTP 
stream, token-bucket-based remarkers operating in color-blind mode
(see [RFC2697] and [RFC2698] for examples) may remark solely based on 
flow rate and burst behavior, removing the drop precedence 
distinctions specified by the source. 
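To make the color-blind case concrete, the sketch below follows the single-rate three-color marker of [RFC2697], with token refill computed continuously rather than one token at a time, and maps green/yellow/red onto AF41/AF42/AF43. That color-to-DSCP mapping and the parameter values are assumptions for illustration; a real remarker might drop red packets instead. Because the meter ignores the incoming DSCP, a packet the source marked AF41 can leave as AF43 simply because it arrived in a burst.

```python
class SrTcmColorBlind:
    """Single-rate three-color marker, color-blind mode.

    cir is in bytes per second; cbs and ebs are bucket sizes in bytes.
    """

    def __init__(self, cir: float, cbs: float, ebs: float):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = cbs, ebs  # both buckets start full
        self.last = 0.0

    def _refill(self, now: float) -> None:
        tokens = (now - self.last) * self.cir
        self.last = now
        # Tokens fill the committed bucket first; overflow goes to excess.
        to_committed = min(tokens, self.cbs - self.tc)
        self.tc += to_committed
        self.te = min(self.ebs, self.te + tokens - to_committed)

    def color(self, now: float, size: int) -> str:
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"


# Assumed mapping of meter colors onto AF4x drop precedences.
AF4X_BY_COLOR = {"green": 34, "yellow": 36, "red": 38}


def remark(meter: SrTcmColorBlind, now: float, size: int, source_dscp: int) -> int:
    # Color-blind: source_dscp is deliberately ignored.
    return AF4X_BY_COLOR[meter.color(now, size)]
```

For example, three back-to-back 1000-byte packets all marked AF41 by the source leave as AF41, AF42, and AF43 when the committed and excess buckets each hold 1500 bytes.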


Backbone and other carrier networks may employ a small number of 
DSCPs (e.g., less than half a dozen) to manage a small number of 
traffic aggregates; hosts that use a larger number of DSCPs can 
expect to find that much of their intended differentiation is removed 
by such networks. Better results may be achieved when DSCPs are used 
to spread traffic among a smaller number of Diffserv-based traffic 
subsets or aggregates; see [DIFFSERV-INTERCON] for one proposal. 

This is of particular importance for MPLS-based networks due to the 
limited size of the Traffic Class (TC) field in an MPLS label 
[RFC5462] that is used to carry Diffserv information and the use of 
that TC field for other purposes, e.g., Explicit Congestion 
Notification (ECN) [RFC5129]. For further discussion on use of 
Diffserv with MPLS, see [RFC3270] and [RFC5127]. 
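The 3-bit TC field can carry at most eight values, while a DSCP has 64, so any DSCP-to-TC mapping merges codepoints. The sketch below assumes one simple, hypothetical mapping that keeps the three high-order DSCP bits; it is not taken from [RFC3270] or [RFC5127], and operators configure their own mappings. Under it, the AF4x drop-precedence distinctions disappear.

```python
def dscp_to_tc(dscp: int) -> int:
    """Hypothetical mapping of a 6-bit DSCP onto the 3-bit MPLS TC field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return dscp >> 3  # keep only the class-selector-like high bits


AF41, AF42, AF43, EF = 34, 36, 38, 46
```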
4. Examples


For real-time communications, one might want to mark the audio 
packets using EF and the video packets as AF41. However, a video 
conference receiving the audio packets significantly ahead of the 
video is not useful because lip sync is necessary between audio and 
video. It may still be desirable to send audio with a PHB that 
provides better service, because more reliable arrival of audio helps 
assure smooth audio rendering, which is often more important than 
fully faithful video rendering. There are also limits, as some 
devices have difficulties in synchronizing voice and video when 
packets that need to be rendered together arrive at significantly 
different times. It makes more sense to use different PHBs when the 
audio and video source streams do not share a strict timing 
relationship. For example, video content may be shared within a 
video conference via playback, perhaps of an unedited video clip that 
is intended to become part of a television advertisement. Such 
content sharing video does not need precise synchronization with 
video conference audio, and could use a different PHB, as content 
sharing video is more tolerant to jitter, loss, and delay. 


Within a layered video RTP stream, ordering of frame communication is 
preferred, but importance of frame types varies, making use of PHBs 
with different drop precedences appropriate. For example, I-frames 
that contain an entire image are usually more important than P-frames 
that contain only changes from the previous image because loss of a 
P-frame (or part thereof) can be recovered (at the latest) via the 
next I-frame, whereas loss of an I-frame (or part thereof) may cause 
rendering problems for all of the P-frames that depend on the missing 
I-frame. For this reason, it is appropriate to mark I-frame packets 
with a PHB that has lower drop precedence than the PHB used for 
P-frames, as long as the PHBs preserve ordering among frames (e.g., 
are in a single AF class) - AF41 for I-frames and AF43 for P-frames 
is one possibility. Additional spatial and temporal layers beyond 
the base video layer could also be marked with higher drop precedence 
than the base video layer, as their loss reduces video quality, but 
does not disrupt video rendering. 
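A hypothetical per-packet marking for a layered video RTP stream, keeping every packet within AF4x so that the PHBs do not invite reordering. This is a variation on the AF41/AF43 example above: base-layer P-frames use AF42 here so that enhancement-layer packets can sit at a higher drop precedence than every base-layer frame.

```python
AF41, AF42, AF43 = 34, 36, 38  # a single AF class


def dscp_for_video_packet(frame_type: str, enhancement_layer: bool = False) -> int:
    """frame_type is "I" or "P"; enhancement layers get the highest drop precedence."""
    if enhancement_layer:
        return AF43  # loss reduces quality but does not disrupt rendering
    if frame_type == "I":
        return AF41  # loss would affect every dependent P-frame
    if frame_type == "P":
        return AF42
    raise ValueError(f"unknown frame type {frame_type!r}")
```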


Additional RTP streams in a real-time communication interaction could 
be marked with CS0 and carried as best-effort traffic. One example
is real-time text transmitted as specified in RFC 4103 [RFC4103]. 
Best-effort forwarding suffices because such real-time text has loose 
timing requirements; RFC 4103 recommends sending text in chunks every 
300 ms. Such text is technically real-time, but does not need a PHB 
promising better service than best effort, in contrast to audio or 
video. 
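The examples above can be summarized as a small per-stream policy. The sketch below is hypothetical (the stream kinds and function names are not from any API) and also flags the lip-sync concern: two streams with a strict timing relationship that receive different DSCPs may arrive at noticeably different times.

```python
from typing import Dict

EF, AF41, CS0 = 46, 34, 0

STREAM_DSCP: Dict[str, int] = {
    "audio": EF,           # smooth audio rendering matters most
    "video": AF41,
    "realtime-text": CS0,  # RFC 4103 text, sent in chunks about every 300 ms
}


def dscp_for_stream(kind: str) -> int:
    """Unlisted stream kinds default to best effort (CS0)."""
    return STREAM_DSCP.get(kind, CS0)


def may_desynchronize(kind_a: str, kind_b: str) -> bool:
    """True if the two streams are marked differently and may drift apart."""
    return dscp_for_stream(kind_a) != dscp_for_stream(kind_b)
```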
A WebRTC application may use one or more RTP streams, as discussed
above. In addition, it may use an SCTP-based data channel
[DATA-CHAN] whose QoS treatment depends on the nature of the
application. For example, best-effort treatment of data channels is
likely to suffice for messaging, shared white board, and guided
browsing applications, whereas latency-sensitive games might desire
better QoS for their data channels.


5. Diffserv Interactions


5.1. Diffserv, Reordering, and Transport Protocols


Transport protocols provide data communication behaviors beyond those 
possible at the IP layer. An important example is that TCP [RFC793] 
provides reliable in-order delivery of data with congestion control. 
SCTP [RFC4960] provides additional properties such as preservation of 
message boundaries, and the ability to avoid head-of-line blocking 
that may occur with TCP. 


In contrast, UDP [RFC768] is a basic unreliable datagram protocol 
that provides port-based multiplexing and demultiplexing on top of
IP. Two other unreliable datagram protocols are UDP-Lite [RFC3828], 
a variant of UDP that may deliver partially corrupt payloads when 
errors occur, and DCCP [RFC4340], which provides a range of 
congestion control modes for its unreliable datagram service. 


Transport protocols that provide reliable delivery (e.g., TCP, SCTP) 
are sensitive to network reordering of traffic. When a protocol that 
provides reliable delivery receives a packet other than the next 
expected packet, the protocol usually assumes that the expected 
packet has been lost and updates the peer, which often causes a 
retransmission. In addition, congestion control functionality in 
transport protocols (including DCCP) usually infers congestion when 
packets are lost. This creates additional sensitivity to significant 
network packet reordering, as such reordering may be (mis)interpreted
as loss of the out-of-order packets, causing a congestion control 
response. 
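A toy model shows how pure reordering turns into a retransmission. The sketch assumes a TCP-like receiver that sends one cumulative ACK per arriving segment and a sender that retransmits after three duplicate ACKs, the standard TCP fast-retransmit threshold. No segment is actually lost here, so every retransmission it reports is spurious; in a real TCP sender it would also reduce the congestion window, which this sketch does not model.

```python
from typing import List

DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


def cumulative_acks(arrivals: List[int]) -> List[int]:
    """Receiver: after each arriving segment, ACK the next one still missing."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrivals:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(arrivals: List[int]) -> List[int]:
    """Sender: segments retransmitted after DUPACK_THRESHOLD duplicate ACKs."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in cumulative_acks(arrivals):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUPACK_THRESHOLD:
                retransmitted.append(ack)
        else:
            last_ack, dup_count = ack, 0
    return retransmitted
```

Delaying segment 1 behind three later segments, as in the arrival order [0, 2, 3, 4, 1, 5], triggers a spurious retransmission of segment 1; delaying it behind only one segment does not.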


This sensitivity to reordering remains even when ECN [RFC3168] is in 
use, as ECN receivers are required to treat missing packets as 
potential indications of congestion, because: 


o Severe congestion may cause ECN-capable network nodes to drop 
packets, and 


o ECN traffic may be forwarded by network nodes that do not support 
ECN and hence drop packets to indicate congestion. 
Congestion control is an important aspect of the Internet
architecture; see [RFC2914] for further discussion. 


In general, marking packets with different DSCPs results in different 
PHBs being applied at nodes in the network, making reordering very 
likely due to use of different pools of forwarding resources for each 
PHB. This should not be done within a single network 5-tuple for 
current transport protocols, with the important exceptions of UDP and 
UDP-Lite. 


When PHBs that enable reordering are mixed within a single network 
5-tuple, the effect is to mix QoS-based traffic classes within the 
scope of a single transport protocol connection or association. As 
these QoS-based traffic classes receive different network QoS 
treatments, they use different pools of network resources and hence 
may exhibit different levels of congestion. The result for 
congestion-controlled protocols is that a separate instance of 
congestion control functionality is needed per QoS-based traffic 
class. Current transport protocols support only a single instance of 
congestion control functionality for an entire connection or 
association; extending that support to multiple instances would add 
significant protocol complexity. Traffic in different QoS-based 
classes may use different paths through the network; this complicates 
path integrity checking in connection- or association-based
protocols, as those paths may fail independently. 
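A deliberately simple model of why separate resource pools reorder packets: each DSCP maps to its own queue with a fixed delay at a single node. Real queueing delays vary over time, so the fixed delays and function names below are assumptions for illustration only. Mixing a lightly loaded EF queue with a backed-up AF41 queue inside one 5-tuple reorders the packets; using a single DSCP does not.

```python
from typing import Dict, List, Tuple


def arrival_order(
    packets: List[Tuple[int, int, int]], queue_delay_ms: Dict[int, int]
) -> List[int]:
    """packets are (sequence, send_time_ms, dscp); returns sequences in arrival order."""
    arrivals = [(send + queue_delay_ms[dscp], seq) for seq, send, dscp in packets]
    return [seq for _, seq in sorted(arrivals)]


def count_inversions(order: List[int]) -> int:
    """Number of packet pairs delivered out of sending order."""
    return sum(
        1
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if order[i] > order[j]
    )
```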


The primary example where usage of multiple PHBs does not enable 
reordering within a single network 5-tuple is use of PHBs from a 
single AF class (e.g., AF1x). Traffic reordering within the scope of
a network 5-tuple that uses a single PHB or AF class may occur for 
other transient reasons (e.g., routing changes or ECMP rebalancing). 


Reordering also affects other forms of congestion control, such as 
techniques for RTP congestion control that were under development
when this memo was published; see [RMCAT-CC] for requirements. These 
techniques prefer use of a common (coupled) congestion controller for 
RTP streams between the same endpoints to reduce packet loss and 
delay by reducing competition for resources at any shared bottleneck. 


Shared bottlenecks can be detected via techniques such as correlation 
of one-way delay measurements across RTP streams. An alternate 
approach is to assume that the set of packets on a single network 
5-tuple marked with DSCPs that do not enable reordering will utilize 
a common network path and common forwarding resources at each network 
node. Under that assumption, any bottleneck encountered by such 
packets is shared among all of them, making it safe to use a common 
(coupled) congestion controller (see [COUPLED-CC]). This is not a 
safe assumption when the packets involved are marked with DSCP values 
that enable reordering, because a bottleneck may not be shared among 
all such packets (e.g., when the DSCP values result in use of 
different queues at a network node, but only one queue is a 
bottleneck). 

Black & Jones Informational [Page 14] 
RFC 7657 Diffserv and RT Communication November 2015 
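The delay-correlation technique mentioned above can be illustrated with a deliberately naive sketch. It correlates one-way delay (OWD) samples from two RTP streams and treats strong correlation as evidence of a shared bottleneck. It assumes the samples are already aligned in time and that any clock offset between sender and receiver is constant (a constant offset does not change the correlation). The 0.8 threshold and all names are hypothetical; published shared-bottleneck detection mechanisms use more robust statistics.

```python
import math
from typing import Sequence

# Naive shared-bottleneck test; threshold and names are illustrative only.


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sample series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned series with at least two samples")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat series carries no evidence either way
    return cov / (sd_x * sd_y)


def likely_shared_bottleneck(owd_a: Sequence[float], owd_b: Sequence[float],
                             threshold: float = 0.8) -> bool:
    """Treat strongly correlated queuing delay as evidence of a shared bottleneck."""
    return pearson(owd_a, owd_b) >= threshold
```

As the text notes, this kind of measurement becomes necessary precisely when packets carry DSCPs that enable reordering, since a common bottleneck can then no longer be assumed.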


UDP and UDP-Lite are not sensitive to reordering in the network, 
because they do not provide reliable delivery or congestion control. 
On the other hand, when used to encapsulate other protocols (e.g., as 
UDP is used by WebRTC; see Section 2.1), the reordering 
considerations for the encapsulated protocols apply. For the 
specific usage of UDP by WebRTC, every encapsulated protocol (i.e., 
RTP, SCTP, and TCP) is sensitive to reordering as further discussed 
in this memo. In addition, [RFC5405] provides general guidelines for 
use of UDP (and UDP-Lite); the congestion control guidelines in that 
document apply to protocols encapsulated in UDP (or UDP-Lite). 


5.2. Diffserv, Reordering, and Real-Time Communication 


Real-time communications are also sensitive to network reordering of 
packets. Such reordering may lead to unneeded retransmission and 
spurious retransmission control signals (such as NACK) in reliable 
delivery protocols (see Section 5.1). The degree of sensitivity 
depends on protocol or stream timers, in contrast to reliable 
delivery protocols that usually react to all reordering. 


Receiver jitter buffers have important roles in the effect of 
reordering on real-time communications: 


o Minor packet reordering that is contained within a jitter buffer 
usually has no effect on rendering of the received RTP stream 
because packets that arrive out of order are retrieved in order 
from the jitter buffer for rendering. 


o Packet reordering that exceeds the capacity of a jitter buffer can 
cause user-perceptible quality problems (e.g., glitches, noise) 
for delay-sensitive communication, such as interactive 
conversations for which small jitter buffers are necessary to 
preserve human perceptions of real-time interaction. Interactive 
real-time communication implementations often discard data that is 
sufficiently late so that it cannot be rendered in source stream 
order, making retransmission counterproductive. For this reason, 
implementations of interactive real-time communication often do 
not use retransmission. 


o In contrast, replay of recorded media can tolerate significantly 
longer delays than interactive conversations, so replay is likely 
to use larger jitter buffers than interactive conversations. 
These larger jitter buffers increase the tolerance of replay to 
reordering by comparison to interactive conversations. The size 
of the jitter buffer imposes an upper bound on replay tolerance to 
reordering but does enable retransmission to be used when the 
jitter buffer is significantly larger than the amount of data that 
can be expected to arrive during the round-trip latency for 
retransmission. 
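The jitter buffer behavior described in the list above can be sketched as follows. This is a minimal illustration, not a production design. It assumes capacity is counted in packets, that sequence numbers are plain increasing integers (real RTP sequence numbers are 16-bit and wrap), and that the caller drives playout; the class and method names are hypothetical. Minor reordering is absorbed, reordering beyond capacity causes a skip, and late packets are discarded rather than rendered out of order.

```python
import heapq
from typing import List, Tuple

# Hypothetical receiver jitter buffer; no sequence-number wraparound handling.


class JitterBuffer:
    def __init__(self, capacity: int, first_seq: int = 0):
        self.capacity = capacity        # packets held before forcing playout
        self.next_seq = first_seq       # next sequence number to render
        self.heap: List[Tuple[int, bytes]] = []
        self.discarded_late = 0         # packets that arrived too late

    def push(self, seq: int, payload: bytes) -> None:
        if seq < self.next_seq:
            # Its playout slot has passed; rendering it now would break
            # source order, so it is dropped.
            self.discarded_late += 1
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self) -> List[bytes]:
        """Release packets in sequence order.

        A packet waits until the gap before it fills. Once the buffer holds
        more than `capacity` packets, the missing packet is treated as lost
        and playout skips ahead.
        """
        out: List[bytes] = []
        while self.heap:
            seq, payload = self.heap[0]
            if seq < self.next_seq:
                heapq.heappop(self.heap)  # duplicate of an already-rendered packet
                continue
            if seq == self.next_seq or len(self.heap) > self.capacity:
                heapq.heappop(self.heap)
                out.append(payload)
                self.next_seq = seq + 1
            else:
                break
        return out
```

A larger `capacity` (as for replay) tolerates more reordering at the cost of added delay, which mirrors the trade-off between interactive conversation and replay described above.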


Network packet reordering has no effective upper bound and can exceed 
the size of any reasonable jitter buffer. In practice, the size of 
jitter buffers for replay is limited by external factors such as the 
amount of time that a human is willing to wait for replay to start. 


5.3. Drop Precedence and Transport Protocols 


Packets within the same network 5-tuple that use PHBs within a single 
AF class can be expected to draw upon the same forwarding resources 
on network nodes (e.g., use the same router queue), and hence use of 
multiple drop precedences within an AF class is not expected to cause 
latency variation. When PHBs within a single AF class are mixed 
within a flow, the resulting overall likelihood that packets will be 
dropped from that flow is a mix of the drop likelihoods of the PHBs 
involved. 
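The "mix of the drop likelihoods" can be made concrete with a weighted average. In this sketch, a fraction f_i of a flow's packets is marked with PHB i and each such packet is assumed to be dropped independently with probability p_i, so the overall drop likelihood is the sum of f_i * p_i. The independence assumption, the function name, and the example numbers are illustrative only; actual per-PHB drop probabilities depend on network configuration and load.

```python
from typing import Iterable, Tuple

# Illustrative model: independent per-packet drops, known per-PHB probabilities.


def flow_drop_probability(mix: Iterable[Tuple[float, float]]) -> float:
    """mix: (fraction_of_packets, per_packet_drop_probability) for each PHB."""
    pairs = list(mix)
    if abs(sum(f for f, _ in pairs) - 1.0) > 1e-9:
        raise ValueError("packet fractions must sum to 1")
    return sum(f * p for f, p in pairs)


# Hypothetical: half the packets as AF41 (0.1% drop), half as AF43 (2% drop).
print(flow_drop_probability([(0.5, 0.001), (0.5, 0.02)]))  # about 0.0105
```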


There are situations in which drop precedences should not be mixed. 
A simple example is that there is little value in mixing drop 
precedences within a TCP connection, because TCP’s ordered delivery 
behavior results in any drop requiring the receiver to wait for the 
dropped packet to be retransmitted. Any resulting delay depends on 
the RTT, not on which packet was dropped. Hence, a single DSCP 
should be used for all packets in a TCP connection. 


As a consequence, when TCP is selected for NAT/FW traversal (e.g., by 
TURN), a single DSCP should be used for all traffic on that TCP 
connection. An additional reason for this recommendation is that 
packetization for STUN/ICE/TURN occurs before passing the resulting 
packets to TCP; TCP resegmentation may result in a different 
packetization on the wire, breaking any association between DSCPs and 
specific data to which they are intended to apply. 
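Because the marking is a per-socket property rather than a per-message one, setting it once on the socket keeps every TCP segment on the connection consistently marked regardless of how TCP resegments the data. A minimal sketch for an IPv4 socket using Python's standard `socket` module follows. The DSCP occupies the upper six bits of the former TOS octet and the lower two bits are ECN; some platforms may ignore or reject `IP_TOS`, and IPv6 sockets use `IPV6_TCLASS` instead. The helper names are hypothetical.

```python
import socket

# Hypothetical helpers; IPv4 only. DSCP = upper 6 bits of the TOS octet.


def dscp_to_tos(dscp: int) -> int:
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2  # leave the two ECN bits as zero


def mark_connection(sock: socket.socket, dscp: int) -> None:
    """Apply one DSCP to all packets the kernel sends on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mark_connection(sock, 34)  # AF41 for the whole connection, set before connect()
sock.close()
```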


SCTP [RFC4960] differs from TCP in a number of ways, including the 
ability to deliver messages in an order that differs from the order 
in which they were sent and support for unreliable streams. However, 
SCTP performs congestion control and retransmission across the entire 
association, and not on a per-stream basis. Although there may be 
advantages to using multiple drop precedences across SCTP streams or 
within an SCTP stream that does not use reliable ordered delivery, 
there is no practical operational experience in doing so (e.g., the 
SCTP sockets API [RFC6458] does not support use of more than one DSCP 
for an SCTP association). As a consequence, the impacts on SCTP 
protocol and implementation behavior are unknown and difficult to 
predict. Hence a single DSCP should be used for all packets in an 
SCTP association, independent of the number or nature of streams in 
that association. Similar reasoning applies to a DCCP connection; a 
single DSCP should be used because the scope of congestion control is 
the connection and there is no operational experience with using more 
than one DSCP. This recommendation may be revised in the future if 
experiments, analysis, and operational experience provide compelling 
reasons to change it. 


Guidance on transport protocol design and implementation to provide 
support for use of multiple PHBs and DSCPs in a transport protocol 
connection (e.g., DCCP) or transport protocol association (e.g., 
SCTP) is out of scope for this memo. 


5.4. Diffserv and RTCP 


RTCP [RFC3550] is used with RTP to monitor quality of service and 
convey information about RTP session participants. A sender of RTCP 
packets that also sends RTP packets (i.e., originates an RTP stream) 
should use the same DSCP marking for both types of packets. If an 
RTCP sender doesn’t send any RTP packets, it should mark its RTCP 
packets with the DSCP that it would use if it did send RTP packets 
with media similar to the RTP traffic that it receives. If the RTCP 
sender uses or would use multiple DSCPs that differ only in drop 
precedence for RTP, then it should use the DSCP with the least 
likelihood of drop for RTCP to increase the likelihood of RTCP packet 
delivery. 
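The RTCP marking rule above can be expressed as a small selection function. This sketch assumes the RFC 2597 AF layout (AFxy = 8x + 2y), so within one AF class the lowest drop precedence is the numerically smallest codepoint. A receiver-only SSRC would pass the DSCP(s) it would use for media similar to what it receives. The function names are hypothetical, and cases the text does not cover (DSCPs differing by more than drop precedence) are rejected rather than guessed.

```python
from typing import Iterable

# Hypothetical helpers; AF layout per RFC 2597.


def _af_class(dscp: int):
    cls, prec = dscp >> 3, (dscp >> 1) & 0b11
    return cls if 1 <= cls <= 4 and 1 <= prec <= 3 and (dscp & 1) == 0 else None


def rtcp_dscp(rtp_dscps: Iterable[int]) -> int:
    distinct = sorted(set(rtp_dscps))
    if not distinct:
        raise ValueError("supply the DSCP(s) used, or that would be used, for RTP")
    if len(distinct) == 1:
        return distinct[0]  # same marking as the SSRC's RTP packets
    classes = {_af_class(d) for d in distinct}
    if len(classes) == 1 and None not in classes:
        return distinct[0]  # least likelihood of drop within the AF class
    raise ValueError("DSCPs differ by more than drop precedence; not covered here")


print(rtcp_dscp([36, 38, 34]))  # AF42, AF43, AF41 -> 34 (AF41)
```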


If the SDP bundle extension [SDP-BUNDLE] is used to negotiate sending 
multiple types of media in a single RTP session, then receivers will 
send separate RTCP reports for each type of media, using a separate 
SSRC for each media type; each RTCP report should be marked with the 
DSCP corresponding to the type of media handled by the reporting 
SSRC. 


This guidance may result in different DSCP markings for RTP streams 
and RTCP receiver reports about those RTP streams. The resulting 
variation in network QoS treatment by traffic direction is necessary 
to obtain representative round-trip time (RTT) estimates that 
correspond to the media path RTT, which may differ from the transport 
protocol RTT. RTCP receiver reports may be relatively infrequent, 
and hence the resulting RTT estimates are of limited utility for 
transport protocol congestion control (although those RTT estimates 
have other important uses; see [RFC3550]). For this reason, it is 
important that RTCP receiver reports sent by an SSRC receive the same 
network QoS treatment as the RTP stream being sent by that SSRC. 
6. Guidelines 


The only use of multiple standardized PHBs and DSCPs that does not 
enable network reordering among packets marked with different DSCPs 
is use of PHBs within a single AF class. All other uses of multiple 
PHBs and/or the class selector DSCPs enable network reordering of 
packets that are marked with different DSCPs. Based on this and the 
foregoing discussion, the guidelines in this section apply to use of 
Diffserv with real-time communications. 


Applications and other traffic sources (including RTP SSRCs): 


o Should limit use of DSCPs within a single RTP stream to those 
whose corresponding PHBs do not enable packet reordering. If this 
is not done, significant network reordering may overwhelm 
implementation assumptions about reordering limits, e.g., jitter 
buffer size, causing poor user experiences (see Section 5.2). 

This guideline applies to all of the RTP streams that are within 
the scope of a common (coupled) congestion controller when that 
controller does not use per-RTP-stream measurements for bottleneck 
detection. 


o Should use a single DSCP for RTCP packets, which should be a DSCP 
used for RTP packets that are or would be sent by that SSRC (see 
Section 5.4). 


o Should use a single DSCP for all packets within a reliable 
transport protocol session (e.g., TCP connection, SCTP 
association) or DCCP connection (see Sections 5.1 and 5.3). For 
SCTP, this requirement applies across the entire SCTP association, 
and not just to individual streams within an association. When 
TURN selects TCP for NAT/FW traversal, this guideline applies to 
all traffic multiplexed onto that TCP connection, in contrast to 
use of UDP for NAT/FW traversal. 


o May use different DSCPs whose corresponding PHBs enable reordering 
within a single UDP or UDP-Lite 5-tuple, subject to the above 
constraints. The service differentiation provided by such usage 
is unreliable, as it may be removed or changed by DSCP remarking 
at network boundaries as described in Section 3.2 above. 


o Cannot rely on end-to-end preservation of DSCPs as network node 
remarking can change DSCPs and remove drop precedence distinctions 
(see Section 3.2). For example, if a source uses drop precedence 
distinctions within an AF class to identify different types of 
video frames, using those DSCP values at the receiver to identify 
frame type is inherently unreliable. 
o Should limit use of the CS1 codepoint to traffic for which best 
effort forwarding is acceptable, as network support for use of CS1 
to select a "less than best-effort" PHB is inconsistent. Further, 
some networks may treat CS1 as providing "better than best-effort" 
forwarding behavior. 
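Several of the guidelines above can be applied as a lint-style check on a planned marking for one transport connection, association, or 5-tuple. The following is a minimal sketch; the transport names, constants, and function name are hypothetical, and the per-RTP-stream reordering guideline and end-to-end preservation caveat are not checked. Multiple DSCPs on UDP or UDP-Lite pass the check, but the resulting service differentiation remains unreliable because of possible remarking.

```python
from typing import Iterable, List

# Hypothetical lint for a planned DSCP marking; covers only a subset of the guidelines.

CS1 = 8
SINGLE_DSCP_TRANSPORTS = {"tcp", "sctp", "dccp"}  # includes TCP chosen by TURN
MULTI_DSCP_TRANSPORTS = {"udp", "udp-lite"}


def check_marking(transport: str, dscps: Iterable[int]) -> List[str]:
    t = transport.lower()
    if t not in SINGLE_DSCP_TRANSPORTS | MULTI_DSCP_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    distinct = set(dscps)
    problems: List[str] = []
    if t in SINGLE_DSCP_TRANSPORTS and len(distinct) > 1:
        problems.append(f"{t}: use a single DSCP for the whole connection or association")
    if CS1 in distinct:
        problems.append("CS1: use only where best-effort forwarding is acceptable")
    return problems


print(check_marking("sctp", [46, 34]))
```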


There is no guidance in this memo on how network operators should 
differentiate traffic. Networks may support all of the PHBs 
discussed herein, classify EF and AFxx traffic identically, or even 
remark all traffic to best effort at some ingress points. 
Nonetheless, it is useful for applications and other traffic sources 
to provide finer granularity DSCP marking on packets for the benefit 
of networks that offer QoS service differentiation. A specific 
example is that traffic originating from a browser may benefit from 
QoS service differentiation in within-building and residential access 
networks, even if the DSCP marking is subsequently removed or 
simplified. This is because such networks and the boundaries between 
them are likely traffic bottleneck locations (e.g., due to customer 
aggregation onto common links and/or speed differences among links 
used by the same traffic). 


7. Security Considerations 


The security considerations for all of the technologies discussed in 
this memo apply; in particular, see the security considerations for 
RTP in [RFC3550] and Diffserv in [RFC2474] and [RFC2475]. 


Multiplexing of multiple protocols onto a single UDP 5-tuple via 
encapsulation has implications for network functionality that 
monitors or inspects individual protocol flows, e.g., firewalls and 
traffic monitoring systems. When implementations of such 
functionality lack visibility into encapsulated traffic (likely for 
many current implementations), it may be difficult or impossible to 
apply network security policy and associated controls at a finer 
granularity than the overall UDP 5-tuple. 


Use of multiple DSCPs that enable reordering within an overall real- 
time communication interaction enlarges the set of network forwarding 
resources used by that interaction, thereby increasing exposure to 
resource depletion or failure, independent of whether the underlying 
cause is benign or malicious. This represents an increase in the 
effective attack surface of the interaction and is a consideration in 
selecting an appropriate degree of QoS differentiation among the 
components of the real-time communication interaction. See 
Section 3.3.2.1 of [RFC6274] for related discussion of DSCP security 
considerations. 
8. Privacy Considerations 


Use of multiple DSCPs to provide differentiated QoS service may 
reveal information about the encrypted traffic to which different 
service levels are provided. For example, DSCP-based identification 
of RTP streams combined with packet frequency and packet size could 
reveal the type or nature of the encrypted source streams. The IP 
header used for forwarding has to be unencrypted so that network nodes can read it, 
and the DSCP likewise has to be unencrypted to enable different IP 
forwarding behaviors to be applied to different packets. The nature 
of encrypted traffic components can be disguised via encrypted dummy 
data padding and encrypted dummy packets, e.g., see the discussion of 
traffic flow confidentiality in [RFC4303]. Encrypted dummy packets 
could even be added in a fashion that an observer of the overall 
encrypted traffic might mistake for another encrypted RTP stream. 
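As a rough illustration of padding for traffic flow confidentiality, the sketch below pads each payload, before encryption, up to a multiple of a fixed bucket size so that packet sizes reveal less about the encrypted media. The 2-byte length prefix, the 256-byte bucket, and the function names are hypothetical framing choices for illustration; they are not the padding mechanisms defined for RTP, SRTP, or ESP.

```python
# Hypothetical framing: 2-byte big-endian length prefix, zero fill to a bucket
# boundary. Padding must be applied before encryption so the fill is hidden.

BUCKET = 256


def pad_to_bucket(payload: bytes, bucket: int = BUCKET) -> bytes:
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length prefix")
    framed = len(payload).to_bytes(2, "big") + payload
    return framed + bytes((-len(framed)) % bucket)  # zero fill up to the bucket


def unpad(padded: bytes) -> bytes:
    length = int.from_bytes(padded[:2], "big")
    return padded[2:2 + length]


print(len(pad_to_bucket(b"x" * 10)))  # 256
```

Larger buckets hide more size information at the cost of more bandwidth; dummy packets, as noted above, address packet frequency rather than packet size.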